Abstract

The average distance from a node to all other nodes in a graph, or from a query point in a metric 
space to a set of points, is a fundamental quantity in data analysis. The inverse of the average distance, 
known as the (classic) closeness centrality of a node, is a popular importance measure in the study 
of social networks. We develop novel structural insights on the sparsifiability of the distance relation 
via weighted sampling. Based on that, we present highly practical algorithms with strong statistical 
guarantees for fundamental problems. We show that the average distance (and hence the centrality) for all nodes in a graph can be estimated using $O(\epsilon^{-2})$ single-source distance computations. For a set $V$ of $n$ points in a metric space, we show that after preprocessing which uses $O(n)$ distance computations we can compute a weighted sample $S \subset V$ of size $O(\epsilon^{-2})$ such that the average distance from any query point $v$ to $V$ can be estimated from the distances from $v$ to $S$. Finally, we show that for a set of points $V$ in a metric space, we can estimate the average pairwise distance using $O(n + \epsilon^{-2})$ distance computations. The estimate is based on a weighted sample of $O(\epsilon^{-2})$ pairs of points, which is computed using $O(n)$ distance computations. Our estimates are unbiased with normalized root mean square error (NRMSE) of at most $\epsilon$. Increasing the sample size by an $O(\log n)$ factor ensures that the probability that the relative error exceeds $\epsilon$ is polynomially small.


1 Introduction 


Measures of structural centrality based on shortest-paths distances, first studied by Bavelas [4], are classic tools in the analysis of social networks and other graph datasets. One natural measure of the importance of a node in a network is its classic closeness centrality, defined as the inverse of its average distance to all other nodes. This centrality measure, which is also termed Bavelas closeness centrality or the Sabidussi Index, was proposed by Bavelas [4], Beauchamp [5], and Sabidussi [20]. Formally, for a graph $G = (V,E)$ with $|V| = n$ nodes, the classic closeness centrality of $v \in V$ is

\[ \mathrm{CC}(v) = \frac{n-1}{\sum_{u \in V} \mathrm{dist}(u,v)} , \tag{1} \]

where $\mathrm{dist}(u,v)$ is the length of a shortest path between $v$ and $u$ in $G$ and $n$ is the number of nodes.
Intuitively, this measure of centrality reflects the ability of a node to send goods to all other nodes. 

In metric spaces, the average distance of a point $z$ to a set $V$ of $n$ points, $\sum_{x \in V} \mathrm{dist}(z,x)/n$, is a
fundamental component in some clustering and classification tasks. For clustering, the quality of a cluster 
can be measured by the sum of distances from a centroid (usually the 1-median or the mean in Euclidean data). Consequently, the (potential) relevance of a query point to the cluster can be estimated by relating its average distance to the cluster points to that of the center or, more generally, to the distribution of the average distance of each cluster point to all others. This classification method has the advantage of being non-parametric (making no distribution assumptions on the data), similarly to the popular k nearest neighbors [10] (kNN) classification. Average distance based classification complements kNN, in that it targets settings where the outliers in the labeled points do carry information that should be incorporated in the classifier. A recent study demonstrated that this is the case for some data sets in the UCI repository, where average distance based classification is much more accurate than kNN classification.

These notions of centrality and average distance have been extensively used in the analysis of social
networks and metric data sets. We aim here to provide better tools to facilitate the computation of these 
measures on very large data sets. In particular, we present estimators with tight statistical guarantees whose 
computation is highly scalable. 

We consider inputs that are either in the form of an undirected graph (with nonnegative edge weights) or a set of points in a metric space. In the case of graphs, distances of the underlying metric correspond to lengths of shortest paths. Our results also extend to inputs specified as directed strongly connected graphs where the distances are the round trip distances. We use a unified notation where $V$ is the set of nodes if the input is a graph, or the set of points in a metric space. We denote $|V| = n$. We use graph terminology, and mention metric spaces only when there is a difference between the two applications. We find it convenient to work with the sum of distances

\[ W(v) = \sum_{u \in V} \mathrm{dist}(v,u) . \]

Average distance is then simply $W(v)/n$ and centrality is $\mathrm{CC}(v) = (n-1)/W(v)$. Moreover, estimates $\hat{W}(v)$ that are within a small relative error, that is $(1-\epsilon) W(v) \le \hat{W}(v) \le (1+\epsilon) W(v)$, imply a small relative error on the average distance, by taking $\hat{W}(v)/n$, and for centrality $\mathrm{CC}(v)$, by taking $\widehat{\mathrm{CC}}(v) = (n-1)/\hat{W}(v)$.
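To make the definitions concrete, the following minimal sketch computes the exact sum of distances W(v) and the closeness centrality CC(v) on a small weighted graph. The adjacency-dictionary representation, the function names, and the toy path graph are assumptions made for illustration only; the graph is assumed to be connected and undirected.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source in a graph given as {u: {v: weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def distance_sum(graph, v):
    """W(v): sum of shortest-path distances from v to every node."""
    return sum(dijkstra(graph, v).values())

def closeness(graph, v):
    """CC(v) = (n - 1) / W(v), as in equation (1)."""
    return (len(graph) - 1) / distance_sum(graph, v)

# Hypothetical toy input: the unit-weight path a - b - c.
path = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
```

Computing W(v) for every node this way costs one single-source computation per node, which is the exact baseline that the sampling approach is meant to avoid.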

We list the fundamental computational problems related to these measures. 

• All-nodes sums: Compute $W(v)$ of all $v \in V$.

• Point queries (metric space): Preprocess a set of points $V$ in a metric space, such that given a query point $v$ (any point in the metric space, not necessarily $v \in V$), we can quickly compute $W(v)$.

• 1-median: Compute the node $u$ of maximum centrality or equivalently, minimum $W(u)$.

• All-pairs sum: Compute the sum of the distances between all pairs, that is $\mathrm{APS}(V) = \frac{1}{2} \sum_{v \in V} W(v)$.

In metric spaces, we seek algorithms that compute distances for a small number of pairs of points. In graphs, a distance computation between a specific pair of nodes $u, v$ seems to be computationally equivalent in the worst case to computing all distances from a single source node (one of the nodes) to all other nodes. Therefore, we seek algorithms that perform a small number of single-source shortest paths (SSSP) computations. An SSSP computation in a graph can be performed using Dijkstra's algorithm in time that is nearly linear in the number of edges [12]. To support parallel computation, it is also desirable to reduce dependencies between the distance or single-source distance computations.

The best known exact algorithms for the problems that we listed above do not scale well. To compute $W(v)$ for all $v$, all-pairs sum, and 1-median, we need to compute the distances between all pairs of nodes, which in graphs is equivalent to an all-pairs shortest paths (APSP) computation. To answer point queries, we need to compute the distances from the query point to all points in $V$. In graphs, the hardness of some of these problems was formalized by the notion of subcubic equivalence [23]. Abboud et al. [1] showed that exact 1-median is subcubic equivalent to APSP and therefore is unlikely to have a near linear time solution. We apply a similar technique and show (in Section 7) that the all-pairs sum problem is also subcubic equivalent to APSP. In general metric spaces, exact all-pairs sum or 1-median clearly requires $\Omega(n^2)$ distance computations.¹

Since exact computation does not scale to very large data sets, work in the area focused on approximations with small relative errors. We measure approximation quality by the normalized root mean square error (NRMSE), which is the square root of the expected (over randomization used in the algorithm) square difference between the estimate and the actual value, divided by the mean. When the estimator is unbiased (as with sample average), this is the ratio between the standard deviation and the mean, which is called the coefficient of variation (CV). Chebyshev's inequality implies that the probability that the estimator is within a relative error of $\eta$ from its mean is at least $1 - (\mathrm{CV})^2/\eta^2$. Therefore a CV of $\epsilon$ implies that the estimator is within a relative error of $\eta = c\epsilon$ from its mean with probability $\ge 1 - 1/c^2$.
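As a worked instance of the Chebyshev bound above, the following plugs in illustrative numbers (the values $\epsilon = 0.1$ and $c = 3$ are chosen here only as an example and are not taken from the analysis):

```latex
\[
  \eta = c\,\epsilon = 3 \cdot 0.1 = 0.3, \qquad
  \Pr\bigl[\text{relative error} \le \eta\bigr]
  \;\ge\; 1 - \frac{\epsilon^2}{\eta^2}
  \;=\; 1 - \frac{1}{c^2}
  \;=\; 1 - \frac{1}{9} \;\approx\; 0.889 .
\]
```

So an estimator with CV 0.1 is within 30% of its mean with probability at least about 0.89. The bound tightens only quadratically in $c$, which is why the concentration argument discussed next is needed for high-probability guarantees.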

The sampling based estimates that we consider are also well concentrated, meaning roughly that the probability of a larger error decreases exponentially with sample size. With concentration, by increasing the sample size by a factor of $O(\log n)$ we get that the probability that the relative error exceeds $\epsilon$, for any one of polynomially many queries, is polynomially small. In particular, we can estimate the sum of the distances of the 1-median from all other nodes up to a relative error of $\epsilon$ with a polynomially small error probability.

Previous work 

We review previous work on scalable approximation of 1-median, all-nodes sums, and all-pairs sum. These problems were studied in metric spaces and graphs. A natural approach to approximate the centrality of nodes is to take a uniform sample $S$ of nodes, perform $|S|$ single source distance computations to determine all distances from every $u \in S$ to every $v \in V$, and then estimate $W(v)$ by $\hat{W}(v) = \frac{n}{|S|} W_S(v)$, where $W_S(v) = \sum_{a \in S} \mathrm{dist}(v,a)$ is the sum of the distances from $v$ to the nodes of $S$. This approach was used by Indyk [15] to compute a $(1+\epsilon)$-approximate 1-median in a metric space using only $O(\epsilon^{-2} n)$ distance computations (see also [14] for a similar result with a weaker bound). We discuss this uniform sampling approach in more detail in a later section, where for completeness, we show how it can be applied to the all-nodes sums problem.

The sample average of a uniform sample was also used to estimate all-nodes centrality [11] (albeit with weaker, additive guarantees) and to experimentally identify the (approximate) top $k$ centralities [19]. When the distance distribution is heavy-tailed, however, the sample average as an estimate of the true average can have a large relative error. This is because the sample may miss out on the few far nodes that dominate $W(v)$.
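The failure mode of uniform sampling on heavy-tailed inputs can be seen on a tiny example. The sketch below scales the sum of distances to a uniform sample up by n/|S|; the function names and the four-point input on a line are hypothetical and serve only as an illustration.

```python
def uniform_estimate(points, sample_idx, v, dist):
    """Scale the sum of distances from v to a uniform sample up by n/|S|."""
    n = len(points)
    w_s = sum(dist(v, points[i]) for i in sample_idx)
    return n / len(sample_idx) * w_s

def exact_sum(points, v, dist):
    """W(v): exact sum of distances from v to all points."""
    return sum(dist(v, x) for x in points)

def abs_dist(a, b):
    return abs(a - b)

# Hypothetical heavy-tailed input: three coinciding points and one far point.
pts = [0.0, 0.0, 0.0, 100.0]

exact = exact_sum(pts, 0.0, abs_dist)                 # true W(v) for v = 0
missed = uniform_estimate(pts, [0, 1], 0.0, abs_dist)  # sample without the far point
hit = uniform_estimate(pts, [3], 0.0, abs_dist)        # sample with only the far point
```

Here the true sum is 100, while a sample that misses the far point estimates 0 and a sample consisting of the far point alone estimates 400. The estimator is unbiased on average, but any single small sample can be far off in relative terms.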

Recently, Cohen et al. [6] obtained $\epsilon$ NRMSE estimates for $W(v)$ for any $v$, using single-source distance computations from each node in a uniform sample of $\epsilon^{-3}$ nodes. Estimates that are within a relative error of $\epsilon$ for all nodes were obtained using $\epsilon^{-3} \log n$ single-source computations. This approach applies in any metric space. The estimator for a point $v$ is obtained by using the average of the distances from $v$ to a uniform sample for nodes which are "close" to $v$ and estimating distances to nodes "far" from $v$ by their distance to the sampled node closest to $v$. The resulting estimate is biased, but obtains small relative errors using essentially the information of single-source distances from a uniform sample.

¹ Take a symmetric distance matrix with all entries in $(1 - 1/n, 1]$. To determine the 1-median we need to compute the exact sum of entries in each row, that is, to exactly evaluate all entries in the row. This is because an unread entry of 0 in any row would determine the 1-median. Similarly, to compute the exact sum of distances we need to evaluate all entries. Deterministically, this amounts to $\binom{n}{2}$ distance computations.

For the all-pairs sum problem in metric spaces, Indyk [14] showed that it can be estimated by scaling up the average of $O(n\epsilon^{-3.5})$ distances between pairs of points selected uniformly at random. The estimate has a relative error of at most $\epsilon$ with constant probability. Barhum, Goldreich, and Shraibman [2] improved Indyk's bound and showed that a uniform sample of $O(n\epsilon^{-2})$ distances suffices and also argued that this sample size is necessary (with uniform sampling). Barhum et al. also showed that in a Euclidean space a similar approximation can be obtained by projecting the points onto $O(1/\epsilon^2)$ random directions and averaging the distances between all pairwise projections. Goldreich and Ron [13] showed that in an unweighted graph $O(\epsilon^{-2}\sqrt{n})$ distances between random pairs of points suffice to estimate the sum of all pairwise distances, within a relative error of $\epsilon$, with constant probability. They also showed that $O(\epsilon^{-2}\sqrt{n})$ distances from a fixed node $s$ to random nodes $v$ suffice to estimate $W(s)$, within a relative error of $\epsilon$, with constant probability. A difficulty with using this result, however, is that in graphs it is expensive to compute distances between random pairs of points in a scalable way: typically a single distance between a particular pair of nodes $s$ and $t$ is not easier to obtain than a complete single source shortest path tree from $s$.

Contributions and overview 

Our design is based on computing a single weighted sample that provides estimates with statistical guarantees for all nodes/points. A sample of size $O(\epsilon^{-2})$ suffices to obtain estimates $\hat{W}(z)$ with a CV of $\epsilon$ for any $z$. A sample of size $O(\epsilon^{-2} \log n)$ suffices for ensuring a relative error of at most $\epsilon$ for all nodes in a graph or for polynomially many queries in a metric space, with probability that is at least $1 - 1/\mathrm{poly}(n)$.

The sampling algorithm is provided in Section 2. This algorithm computes a coefficient $\gamma_v$ for each $v \in V$ such that $\sum_v \gamma_v = O(1)$. Then for a parameter $k$, we obtain sampling probabilities $p_u = \min\{1, k\gamma_u\}$ for $u \in V$. Using the probabilities $p_u$, we can obtain a Poisson sample $S$ of expected size $\sum_u p_u = O(k)$ or a VarOpt sample [8] that has exactly that size (rounded to an integer).

We present our estimators in Section 3. For each node $u$, the inverse probability estimator $\widehat{\mathrm{dist}}(z,u)$ is equal to $\mathrm{dist}(z,u)/p_u$ if $u$ is sampled and is 0 otherwise. Our estimate of the sum $W(z)$ is the sum of these estimates

\[ \hat{W}(z) = \sum_{u \in V} \widehat{\mathrm{dist}}(z,u) = \sum_{u \in S} \widehat{\mathrm{dist}}(z,u) = \sum_{u \in S} \frac{\mathrm{dist}(z,u)}{p_u} . \tag{2} \]

Since $p_u > 0$ for all $u$, the estimates $\widehat{\mathrm{dist}}(z,u)$ and hence the estimate $\hat{W}(z)$ are unbiased.
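The unbiasedness of the inverse probability estimator can be checked exactly on a toy instance by averaging over every possible outcome of a Poisson sample. The sketch below does this with exact rational arithmetic; the points, the probabilities, and the function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def inverse_probability_estimate(sample, z, dist):
    """Sum of dist(z, u) / p_u over the sampled pairs (u, p_u)."""
    return sum(Fraction(dist(z, u)) / p for u, p in sample)

def exact_expectation(points, probs, z, dist):
    """Average the estimator over all 2^n outcomes of a Poisson sample."""
    total = Fraction(0)
    for included in product([False, True], repeat=len(points)):
        weight = Fraction(1)  # probability of this particular outcome
        sample = []
        for u, p, inc in zip(points, probs, included):
            weight *= p if inc else 1 - p
            if inc:
                sample.append((u, p))
        total += weight * inverse_probability_estimate(sample, z, dist)
    return total

def abs_dist(a, b):
    return abs(a - b)

pts = [0, 1, 5]                                        # hypothetical points on a line
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1)]  # any p_u > 0 works
expected = exact_expectation(pts, probs, 2, abs_dist)  # query point z = 2
```

For the query point z = 2 the true sum is 2 + 1 + 3 = 6, and the expectation computed above matches it exactly, whatever positive probabilities are used. The probabilities affect only the variance, which is what the choice of coefficients is designed to control.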

We provide a detailed analysis in Section 4. We will show that our sampling probabilities provide the following guarantees. When choosing $k = O(\epsilon^{-2})$, $\hat{W}(z)$ has CV $\epsilon$. Moreover, the estimates have good concentration, so using a larger sample size of $O(\epsilon^{-2} \log n)$ we obtain that the relative error is at most $\epsilon$ for all nodes $v \in V$ with probability at least $1 - 1/\mathrm{poly}(n)$.

In order to obtain a sample with such guarantees for some particular node $z$, the sampling probability of a node $v$ should be (roughly) proportional to its distance $\mathrm{dist}(z,v)$ from $z$. Such a Probability Proportional to Size (PPS) sample of size $k = \epsilon^{-2}$ uses coefficients $\gamma_v = \mathrm{dist}(v,z)/W(z)$ and has CV of $\epsilon$. We will work with approximate PPS coefficients, which we define as satisfying $\gamma_v \ge c\,\mathrm{dist}(v,z)/W(z)$ for some constant $c$. With approximate PPS we obtain a CV of $\epsilon$ with a sample of size $O(\epsilon^{-2})$. It is far from clear a priori, however, that there is a single set of universal PPS coefficients which are simultaneously (approximate) PPS for all nodes and are of size $\sum_v \gamma_v = O(1)$. That is, a single sample of size $O(\epsilon^{-2})$, which is independent of $n$ and of the dimension of the space, would work for all nodes.
Beyond establishing the existence of universal PPS coefficients, we are interested in obtaining them, and the sample itself, using a near-linear computation. The dominant component of the computation of the sampling coefficients is performing $O(1)$ single-source distance computations. Therefore, it requires $O(m \log n)$ time in graphs and $O(n)$ pairwise distance queries in a metric space. A universal PPS sample of any given size $k$ can then be computed in a single pass over the coefficients vector $\gamma$ ($O(n)$ computation). We represent the sample $S$ as a collection $\{(u, p_u)\}$ of nodes/points and their respective sampling probabilities. We can then use our sample for estimation using (2).

When the input is a graph, we compute single-source distances from each node in $S$ to all other nodes in order to estimate $W(v)$ of all $v \in V$. This requires $O(|S| m \log n)$ time and $O(n)$ space.

Theorem 1.1. All-nodes sums ($W(v)$ for all $v \in V$) can be estimated unbiasedly as follows:

• With CV $\epsilon$, using $O(\epsilon^{-2})$ single source distance computations.

• When using $O(\epsilon^{-2} \log n)$ single source distance computations, the probability that the maximum relative error, over the $n$ nodes, exceeds $\epsilon$ is polynomially small:

\[ \Pr\left[ \max_{z \in V} \frac{|\hat{W}(z) - W(z)|}{W(z)} > \epsilon \right] < 1/\mathrm{poly}(n) . \]


In a metric space, we can estimate $W(x)$ for an arbitrary query point $x$, which is not necessarily a member of $V$, by computing the distances $\mathrm{dist}(x,v)$ for all $v \in S$ and applying the estimator (2). Thus, point queries in a metric space require $O(n)$ distance computations for preprocessing and $O(\epsilon^{-2})$ distance computations per query.


Theorem 1.2. We can preprocess a set of points $V$ in a metric space using $O(n)$ time and $O(n)$ distance computations ($O(1)$ single source distance computations) to generate a weighted sample $S$ of a desired size $k$. From the sample, we can unbiasedly estimate $W(z)$ using the distances between $z$ and the points in $S$ with the following guarantees:

• When $k = O(\epsilon^{-2})$, for any point query $z$, $\hat{W}(z)$ has CV at most $\epsilon$.

• When $k = O(\epsilon^{-2} \log n)$, the probability that the relative error of $\hat{W}(z)$ exceeds $\epsilon$ is polynomially small:

\[ \Pr\left[ \frac{|\hat{W}(z) - W(z)|}{W(z)} > \epsilon \right] < 1/\mathrm{poly}(n) . \]


We can also estimate all-pairs sum, using either primitive of single-source distances (for graphs) or 
distance computations (metric spaces). 

Theorem 1.3. All-pairs sum can be estimated unbiasedly with the following statistical guarantees:

• With CV of at most $\epsilon$, using $O(\epsilon^{-2})$ single-source distance computations. With a relative error that exceeds $\epsilon$ with a polynomially small probability, using $O(\epsilon^{-2} \log n)$ single-source distance computations.

• With CV of at most $\epsilon$, using $O(n + \epsilon^{-2})$ distance computations. With a relative error that exceeds $\epsilon$ with polynomially small probability,

\[ \Pr\left[ \frac{|\widehat{\mathrm{APS}}(V) - \mathrm{APS}(V)|}{\mathrm{APS}(V)} > \epsilon \right] < 1/\mathrm{poly}(n) , \]

using $O(n + \epsilon^{-2} \log n)$ distance computations.

The proof details are provided in a later section. The part of the claim that uses single-source distance computations is established by using the estimate $\widehat{\mathrm{APS}}(V) = \frac{1}{2} \sum_{z \in V} \hat{W}(z)$. When the estimates $\hat{W}(z)$ have CV of at most $\epsilon$, even if correlated, so does the estimate $\widehat{\mathrm{APS}}(V)$.² For the high probability claim, we use $O(\log n)$ single-source computations to ensure we obtain universal PPS coefficients with high probability (details are provided later), which imply that each estimate $\hat{W}(z)$, and hence the sum, is concentrated.

For the second part that uses distance computations, we consider an approximate PPS distribution that is with respect to $\mathrm{dist}(u,v)$, that is, the probability of sampling the pair $(u,v)$ is at least $c\,\mathrm{dist}(u,v)/\mathrm{APS}(V)$ for some constant $c$. We show that we can compactly represent this distribution as the outer product of two probability vectors of size $n$. Using this representation we can draw $O(\epsilon^{-2})$ pairs independently in linear time, which we use for estimating the average.

Compared to the all-nodes sums algorithms of [6], our result here improves the dependency on $\epsilon$ from $\epsilon^{-3}$ to $\epsilon^{-2}$ (which is likely to be optimal for a sampling based approach), provides unbiased estimates, and also facilitates approximate average distance oracles with very small storage in metric spaces (the approach of [6] would require the oracle to store a histogram of distances from each of the sampled nodes). For the all-pairs sum problem in graphs, we obtain an algorithm that uses $O(\epsilon^{-2})$ single source distance computations, which improves over an algorithm that does $O(\epsilon^{-3})$ single source distance computations implied by [6]. For the all-pairs sum problem in a metric space, we obtain a CV of $\epsilon$ using $O(n + \epsilon^{-2})$ distance computations rather than the $O(n\epsilon^{-2})$ distance computations required by the algorithms in [2, 14].

While our analysis does not optimize constants, our algorithms are very simple and we expect them to 
be effective in applications. 

2 Constructing the sample 

We present Algorithm 1, which computes a set of sampling probabilities associated with the nodes of an input graph $G$. We use graph terminology but the algorithm applies both in graphs and in metric spaces. The input to the algorithm is a set $S_0$ of base nodes and a parameter $k$ (we discuss how to choose $S_0$ and $k$ below). The algorithm consists of the following stages. We first compute a sampling coefficient $\gamma_v$ for each node $v$ such that $\sum_v \gamma_v = O(1)$. Then we use the parameter $k$ and compute the sampling probabilities $p_v = \min\{1, k\gamma_v\}$. Finally we use the probabilities $p_v$ to draw a sample of expected size $O(k)$, by choosing $v$ with probability $p_v$. We usually apply the algorithm once with a pre-specified $k$ to obtain a sample, but there are applications (discussed in a later section) in which we want to choose the sample size adaptively using the same coefficients.

Running time and sample size. The running time of this algorithm on a metric space is dominated by $|S_0| n$ distance computations. On a graph, the running time is $O(|S_0| m \log n)$, and is dominated by the $|S_0|$ single-source shortest-paths computations. The expected size of the final sample $S$ is $\sum_v p_v \le k \sum_v \gamma_v = O(k)$.

Choosing the base set $S_0$. We will show that in order to obtain the property that each estimate $\hat{W}(v)$ has CV $O(\epsilon)$, it suffices that the base set $S_0$ includes a uniform sample of $\ge 2$ nodes and that we choose $k = \epsilon^{-2}$. Note that the CV is computed over the randomization in the choice of nodes for $S_0$ and of the sample we choose using the computed coefficients. We will also introduce a notion of a well positioned node, which we precisely define in the sequel. We will see that when $S_0$ includes such a node, we also have CV of $O(\epsilon)$ with $k = \epsilon^{-2}$, this time using only the randomization in the selection of the sample. Moreover, if we choose $k = \epsilon^{-2} \log n$ and ensure that $S_0$ contains a well-positioned node with probability at least $1 - 1/\mathrm{poly}(n)$ then we obtain that the probability that the relative error exceeds $\epsilon$ is polynomially small. We will see that most nodes are well positioned, and therefore, it is relatively simple to identify such a node with high probability.

² In general, if nonnegative random variables $X$ and $Y$ have CV $\epsilon$ then their sum has CV at most $\epsilon$:

\[ \frac{\mathrm{Var}(X+Y)}{(E(X+Y))^2} = \frac{\mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X,Y)}{(E(X+Y))^2} \le \frac{\mathrm{Var}(X) + \mathrm{Var}(Y) + 2\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}{(E(X+Y))^2} = \frac{\epsilon^2 (E(X))^2 + \epsilon^2 (E(Y))^2 + 2\epsilon^2 E(X)E(Y)}{(E(X+Y))^2} = \epsilon^2 . \]

Algorithm 1 Compute universal PPS coefficients and sample

Input: Undirected graph with vertex set $V$ or a set of points $V$ in a metric space, base nodes $S_0$, parameter $k$
Output: A universal PPS sample $S$

// Compute sampling coefficients $\gamma_v$
foreach node $v$ do
    $\gamma_v \gets 1/n$
foreach $u \in S_0$ do
    Compute shortest path distances $\mathrm{dist}(u,v)$ from $u$ to all other nodes $v \in V$
    $W \gets \sum_v \mathrm{dist}(u,v)$
    foreach node $v \in V$ do
        $\gamma_v \gets \max\{\gamma_v, \mathrm{dist}(u,v)/W\}$
foreach node $v \in V$ do // Compute sampling probabilities $p_v$
    $p_v \gets \min\{1, k\gamma_v\}$
$S \gets \emptyset$ // Initialize sample
foreach $v \in V$ do // Poisson sample according to $p_v$
    if rand() < $p_v$ then
        $S \gets S \cup \{(v, p_v)\}$
return $S$
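Algorithm 1 can be sketched for the metric-space case as follows. This is a minimal illustration rather than a tuned implementation: the function names, the list-of-points representation, the explicit random generator argument, and the toy input are all assumptions made here, and the Poisson sampling step is used (not VarOpt).

```python
import random

def universal_pps_coefficients(points, base_idx, dist):
    """gamma_v = max(1/n, max over base points u of dist(u, v) / W(u))."""
    n = len(points)
    gamma = [1.0 / n] * n
    for b in base_idx:
        d = [dist(points[b], x) for x in points]  # n distance computations
        w = sum(d)                                # W(u) for the base point u
        if w > 0:  # guard for the degenerate case where all points coincide
            gamma = [max(g, dv / w) for g, dv in zip(gamma, d)]
    return gamma

def poisson_sample(gamma, k, rng):
    """Include index v independently with probability p_v = min(1, k * gamma_v)."""
    sample = []
    for v, g in enumerate(gamma):
        p = min(1.0, k * g)
        if rng.random() < p:
            sample.append((v, p))  # store the probability with the sampled index
    return sample

def abs_dist(a, b):
    return abs(a - b)

pts = [0.0, 1.0, 2.0, 10.0]   # hypothetical input: points on a line
gamma = universal_pps_coefficients(pts, [0], abs_dist)
sample = poisson_sample(gamma, k=2, rng=random.Random(0))
```

With the single base point 0, the far point at 10 receives coefficient 10/13 while the three nearby points keep the floor value 1/n = 0.25, so with k = 2 the far point is included with probability 1. The coefficients sum to at most 1 + |S_0|, matching the simple bound stated in the text.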

3 Estimation 

3.1 Centrality values for all nodes in a graph 

For graphs, we compute estimates $\hat{W}(v)$ for all nodes $v \in V$ as in Algorithm 2. We initialize all estimates to 0, and perform an SSSP computation from each node $u \in S$. When scanning node $v$ during such an SSSP computation, we add $\mathrm{dist}(u,v)/p_u$ to the estimate $\hat{W}(v)$. The algorithm runs in $O(|S| m \log n)$ time, dominated by the $|S|$ SSSP computations from each node in the sample $S$.

3.2 Point queries (metric space) 

For a query point $z$ (which is not necessarily a member of $V$), we compute the distance $\mathrm{dist}(z,x)$ for all $x \in S$, and apply (2). This takes $|S|$ distance computations for each query.
Algorithm 2 Compute estimates $\hat{W}(v)$ for all nodes $v$ in the graph

Input: Weighted graph $G$, a sample $S = \{(u, p_u)\}$

foreach $v \in V$ do
    $\hat{W}(v) \gets 0$
foreach $u \in S$ do
    Perform a single-source shortest-paths computation from $u$.
    foreach scanned node $v \in V$ do
        $\hat{W}(v) \gets \hat{W}(v) + \mathrm{dist}(u,v)/p_u$
return $(v, \hat{W}(v))$ for $v \in V$
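A minimal sketch of Algorithm 2 is given below, assuming an undirected graph stored as an adjacency dictionary and a sample given as a list of (node, probability) pairs. The representation, the function names, and the toy path graph are assumptions for illustration; the Dijkstra routine is a standard heap-based version.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source in a graph given as {u: {v: weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def estimate_all_sums(graph, sample):
    """Accumulate dist(u, v) / p_u into the estimate of W(v) for each sampled u."""
    w_hat = {v: 0.0 for v in graph}
    for u, p_u in sample:
        for v, d in dijkstra(graph, u).items():
            w_hat[v] += d / p_u
    return w_hat

# Hypothetical toy input: the unit-weight path a - b - c.
path = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
full = estimate_all_sums(path, [("a", 1.0), ("b", 1.0), ("c", 1.0)])
partial = estimate_all_sums(path, [("c", 0.5)])
```

When every node is sampled with probability 1 the estimates coincide with the exact sums. With a single sampled node c of probability 0.5, each distance to c is scaled up by a factor of 2, which is what makes the estimator unbiased over the random choice of the sample.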


4 Correctness 


We first show (Section 4.1) that when $k = \epsilon^{-2}$ and $S_0$ includes a uniform sample of size at least 2, each estimate $\hat{W}(v)$ has CV of $O(\epsilon)$. We then define well-positioned nodes in Section 4.2 and show that if $S_0$ contains a well positioned node and the sample size is $k = \epsilon^{-2}$ then the CV is $O(\epsilon)$ (Section 4.3), and when $k = O(\epsilon^{-2} \log n)$, the probability that the relative error exceeds $\epsilon$ is polynomially small (Section 4.5).

In Section 4.4 we establish an interesting property of our sampling coefficients: they cannot grow too much even if the base set $S_0$ is very large. Clearly, $\sum_v \gamma_v \le 1 + |S_0|$, but we will show that it is $O(1)$ regardless of the size of $S_0$.

We start with some useful lemmas. 


Lemma 4.1. Suppose that $S_0$ contains a node $u$. Consider a node $z$ such that $u$ is the $(qn)$th closest node to $z$. Then for all nodes $v$,

\[ \gamma_v \ge \frac{1-q}{4}\,\frac{\mathrm{dist}(z,v)}{W(z)} . \tag{3} \]

Proof. From the specification of Algorithm 1, the sampling coefficients $\gamma_v$ satisfy

\[ \gamma_v \ge \max\left\{ \frac{1}{n}, \frac{\mathrm{dist}(u,v)}{W(u)} \right\} . \tag{4} \]

Let $Q = \mathrm{dist}(z,u)$. Consider a classification of the nodes $v \in V$ into "close" nodes and "far" nodes according to their distance from $z$:

\[ L = \{ v \in V \mid \mathrm{dist}(z,v) \le 2Q \}, \qquad H = \{ v \in V \mid \mathrm{dist}(z,v) > 2Q \} . \]

Since $\gamma_v \ge 1/n$, for $v \in L$ we have

\[ \gamma_v \ge \frac{1}{n} = \left(\frac{1-q}{2}\right)\left(\frac{2}{1-q}\right)\frac{1}{n} \ge \left(\frac{1-q}{2}\right)\frac{\mathrm{dist}(z,v)}{W(z)} , \tag{5} \]

where the last inequality holds since for $v \in L$ we have $\mathrm{dist}(z,v) \le 2Q$, and since $W(z) \ge (1-q)\,nQ$ if $u$ is the $(qn)$th closest node to $z$.

For all $v$, we have that $\mathrm{dist}(u,v) \ge \mathrm{dist}(z,v) - Q$ by the triangle inequality. We also have $W(u) \le W(z) + nQ$. Substituting into (4) we get that for every $v$

\[ \gamma_v \ge \frac{\mathrm{dist}(u,v)}{W(u)} \ge \frac{\mathrm{dist}(z,v) - Q}{W(z) + nQ} . \tag{6} \]

In particular, for $v \in H$, we have

\[ \mathrm{dist}(z,v) - Q \ge \frac{1}{2}\,\mathrm{dist}(z,v) . \tag{7} \]

As already mentioned, we also have $W(z) \ge (1-q)\,nQ$ and thus

\[ nQ \le \frac{W(z)}{1-q} \tag{8} \]

and

\[ W(z) + nQ \le W(z)\left(1 + \frac{1}{1-q}\right) = W(z)\,\frac{2-q}{1-q} . \tag{9} \]

Substituting (7) and (9) in (6), we obtain that for $v \in H$,

\[ \gamma_v \ge \frac{\mathrm{dist}(z,v) - Q}{W(z) + nQ} \ge \frac{1}{2}\left(\frac{1-q}{2-q}\right)\frac{\mathrm{dist}(z,v)}{W(z)} . \tag{10} \]

The lemma now follows from (5) and (10). □


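Lemma 4.2. Consider a set of sampling coefficients $\gamma_v$ such that for a node $z$, for all $v$ and for some $c > 0$, $\gamma_v \ge c\,\mathrm{dist}(z,v)/W(z)$. Let $S$ be a sample obtained with probabilities $p_v = \min\{1, k\gamma_v\}$ (as in Algorithm 1), and let $\hat{W}(z)$ be the inverse probability estimator as in (2). Then

\[ \mathrm{Var}[\hat{W}(z)] \le \frac{W(z)^2}{k \cdot c} . \tag{11} \]

Proof. The variance of our estimator is

\[ \mathrm{Var}[\hat{W}(z)] = \sum_{v \in V} \left[ p_v \left( \frac{\mathrm{dist}(z,v)}{p_v} - \mathrm{dist}(z,v) \right)^2 + (1 - p_v)\,\mathrm{dist}(z,v)^2 \right] . \tag{12} \]

Note that nodes $v$ for which $p_v = 1$ contribute 0 to the variance. For the other nodes we use the lower bound $p_v \ge c\,k\,\mathrm{dist}(z,v)/W(z)$:

\[ \mathrm{Var}[\hat{W}(z)] = \sum_{v \in V} \left( \frac{1}{p_v} - 1 \right) \mathrm{dist}(z,v)^2 \le \sum_{v \in V \mid p_v < 1} \frac{\mathrm{dist}(z,v)^2}{p_v} \le \frac{W(z)}{k \cdot c} \sum_{v \in V} \mathrm{dist}(z,v) = \frac{W(z)^2}{k \cdot c} . \]

□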
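The variance formula and the bound of Lemma 4.2 can be checked numerically. The sketch below uses exact PPS coefficients for a single node z (so c = 1); the distance values, the value of k, and the function names are hypothetical. The point z itself is left out of the list because exact PPS would assign it probability 0, whereas Algorithm 1 avoids this through the 1/n floor on the coefficients.

```python
def estimator_variance(dists, probs):
    """Exact variance under Poisson sampling: sum over v of (1/p_v - 1) * d_v^2."""
    return sum((1.0 / p - 1.0) * d * d for d, p in zip(dists, probs))

def pps_probabilities(dists, k):
    """Exact PPS for one node z: gamma_v = dist(z, v) / W(z), so c = 1."""
    w = sum(dists)
    return [min(1.0, k * d / w) for d in dists]

dists = [1.0, 2.0, 3.0, 4.0, 90.0]  # hypothetical distances from z (z itself omitted)
k = 4
probs = pps_probabilities(dists, k)
var = estimator_variance(dists, probs)
bound = sum(dists) ** 2 / k          # W(z)^2 / (k * c) with c = 1
```

In this instance the far point is sampled with probability 1 and contributes nothing to the variance, and the remaining terms add up to 220, well below the bound of 2500. The gap illustrates that the bound in (11) discards the "-1" term and the saving from nodes with $p_v = 1$.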
4.1 Base set containing a uniform sample

We now consider a situation where Sq includes a uniform sample of nodes, and consider the corresponding 
expected approximation quality: 


Lemma 4.3. Suppose that $S_0$ contains a uniform random sample of $b$ nodes. Then for any node $z$,

\[ \mathrm{Var}[\hat{W}(z)] \le \frac{W(z)^2}{k}\,\frac{4b}{b-1} . \tag{13} \]

Proof. We apply Lemma 4.2 with the bound on the coefficients as in Lemma 4.1 with $u$ being the closest node to $z$ in $S_0$. Assume that $u$ is the $x$th closest node to $z$. By Lemma 4.1 and Lemma 4.2 we have

\[ \mathrm{Var}[\hat{W}(z) \mid x] \le \frac{W(z)^2}{k}\,\frac{4}{1 - x/n} . \tag{14} \]

Observe that $x$ is a random variable which is the rank (= position in the sorted order of the nodes by distance from $z$) of the closest node to $z$ in a uniform sample of size $b$. In particular $x$ takes values in $\{1, 2, \ldots, n-b+1\}$ ($x = 1$ iff $u = z$). We have that the probability of rank $x$ is

\[ \frac{b}{n} \left(\frac{n-x}{n-1}\right) \left(\frac{n-x-1}{n-2}\right) \cdots \left(\frac{n-x-b+2}{n-b+1}\right) \le \frac{b}{n} \left(1 - \frac{x}{n}\right)^{b-1} . \]

(We choose the random subset of $S_0$ of $b$ nodes without replacement, and we split into $b$ events according to the step in which the node of rank $x$ is chosen. Other items should be chosen from the $n - x$ nodes of rank larger than $x$.) The variance $\mathrm{Var}[\hat{W}(z)]$ is the expectation, over $x \in \{1, 2, \ldots, n-b+1\}$, of $\mathrm{Var}[\hat{W}(z) \mid x]$. So from (14), we get

\[ \mathrm{Var}[\hat{W}(z)] \le \sum_{x=1}^{n-b+1} \frac{b}{n} \left(1 - \frac{x}{n}\right)^{b-1} \frac{4\,W(z)^2}{k\,(1 - x/n)} = \frac{4b}{k}\,W(z)^2\,\frac{1}{n} \sum_{x=1}^{n-b+1} \left(1 - \frac{x}{n}\right)^{b-2} \le \frac{4b}{k}\,W(z)^2 \int_0^1 (1-y)^{b-2}\,dy = \frac{W(z)^2}{k}\,\frac{4b}{b-1} . \]

□


It follows from Lemma 4.3 that if we choose $b \ge 2$ nodes uniformly into $S_0$ and $k = \epsilon^{-2}$, then for any node $z$, our estimator has $\mathrm{Var}[\hat{W}(z)] = O(\epsilon^2 W(z)^2)$. This concludes the proof of the per-node (per-point) $O(\epsilon)$ bound on the CV of the estimator in the first part of Theorems 1.1 and 1.2 for a sample of size $O(\epsilon^{-2})$.
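The constant $b/(b-1)$ in Lemma 4.3 can be sanity-checked by computing the expectation of $1/(1 - x/n)$ exactly over the rank distribution. The sketch below writes the rank probability in the closed form $\binom{n-x}{b-1}/\binom{n}{b}$, which is a restatement (introduced here, not in the text) of the product expression in the proof; the function names and the test sizes are likewise assumptions.

```python
from fractions import Fraction
from math import comb

def rank_probability(n, b, x):
    """Pr[the closest of b uniform nodes, drawn without replacement, has rank x]."""
    return Fraction(comb(n - x, b - 1), comb(n, b))

def expected_inverse_gap(n, b):
    """E[1 / (1 - x/n)] over x in {1, ..., n - b + 1}, in exact arithmetic."""
    return sum(rank_probability(n, b, x) * Fraction(n, n - x)
               for x in range(1, n - b + 2))

# Compare the exact expectation with b / (b - 1) for a few hypothetical sizes.
for n, b in [(10, 2), (10, 3), (50, 5)]:
    print(n, b, expected_inverse_gap(n, b), Fraction(b, b - 1))
```

For these inputs the two printed values agree, so multiplying by the factor $4W(z)^2/k$ from (14) reproduces the right-hand side of (13). Taking $b = 2$ gives the constant 8, which is why a uniform sample of just two base nodes already yields a CV of $O(\epsilon)$ with $k = \epsilon^{-2}$.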


4.2 Well-positioned nodes 

We provide a precise definition of a well positioned node. Let the median distance of a node $u$, denoted by $m(u)$, be the distance between $u$ and the $\lceil 1 + n/2 \rceil$th closest node to $u$ in $V$. Let $\mathrm{MinMed} = \min_{v \in V} m(v)$ be the minimum median distance of any node $v \in V$. In a metric space, we can define $m(u)$ for any point $u$ in the space (also for $u \notin V$), and accordingly, define MinMed as the minimum $m(u)$ over all points $u$ in the metric space.

We say that a node $u$ is well positioned if $m(u) \le 2\,\mathrm{MinMed}$, that is, $m(u)$, the median distance of $u$, is within a factor of 2 of the minimum median distance. We now show that most nodes are well positioned.

Lemma 4.4. Let $v$ be such that $m(v) = \mathrm{MinMed}$. Then all $\lceil 1 + n/2 \rceil$ nodes in $V$ that are closest to $v$ are well positioned.

Proof. Let $u$ be one of the $\lceil 1 + n/2 \rceil$ nodes closest to $v$. Then $\mathrm{dist}(u,v) \le \mathrm{MinMed}$ and a ball of radius $2\,\mathrm{MinMed}$ around $u$ contains all the $\lceil 1 + n/2 \rceil$ nodes closest to $v$. So $m(u) \le 2\,\mathrm{MinMed}$. □

We are interested in well positioned nodes because of the following property:

Lemma 4.5. If $u$ is a well positioned node, then for every node $z$ we have that $\mathrm{dist}(z,u) \le 3m(z)$.

Proof. For every two nodes $u$ and $z$ we have that $\mathrm{dist}(u,z) \le m(u) + m(z)$ since there must be at least one node $x$ that is both within distance $m(u)$ from $u$ and within distance $m(z)$ from $z$, and by the triangle inequality $\mathrm{dist}(u,z) \le \mathrm{dist}(u,x) + \mathrm{dist}(x,z)$. The lemma follows since if $u$ is well positioned then $m(u) \le 2m(z)$. □

As we shall see, this means that sampling probabilities proportional to the distances from a well positioned node $u$ approximate sampling probabilities proportional to the distances from any other node $z$, for nodes whose distance from $z$ is substantially larger than $m(z)$.
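The definitions of this subsection can be illustrated on a small point set. The sketch below reads the rank in the definition of $m(u)$ as $\lceil 1 + n/2 \rceil$, counting $u$ itself as its own closest point, and takes MinMed over the points of $V$ only; both readings, the function names, and the five-point input are assumptions made for illustration.

```python
from math import ceil

def median_distance(points, u, dist):
    """m(u): distance from u to its ceil(1 + n/2)-th closest point of V."""
    n = len(points)
    ordered = sorted(dist(u, x) for x in points)
    return ordered[ceil(1 + n / 2) - 1]  # 1-based rank -> 0-based index

def well_positioned(points, dist):
    """Points u of V with m(u) <= 2 * MinMed, where MinMed is taken over V."""
    med = [median_distance(points, u, dist) for u in points]
    min_med = min(med)
    return [u for u, m in zip(points, med) if m <= 2 * min_med]

def abs_dist(a, b):
    return abs(a - b)

pts = [0, 1, 2, 3, 100]             # hypothetical input with one outlier
good = well_positioned(pts, abs_dist)

# Lemma 4.5 on this instance: dist(z, u) <= 3 * m(z) for every z and well positioned u.
lemma_4_5_holds = all(
    abs_dist(z, u) <= 3 * median_distance(pts, z, abs_dist)
    for z in pts for u in good
)
```

On this input MinMed is 2, the four clustered points are well positioned, and only the outlier at 100 is not. This matches Lemma 4.4, which guarantees that at least $\lceil 1 + n/2 \rceil = 4$ of the 5 points are well positioned, so a few uniformly chosen base nodes are likely to include one.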


4.3 Base set with a well-positioned node 


We now consider the case where $S_0$ contains a well-positioned node. We show that in this case the coefficients satisfy what we call a universal PPS property:

Lemma 4.6. Suppose that $S_0$ contains a well-positioned node $u$. Then for all nodes $v$,

\[ \gamma_v \ge \frac{1}{18} \max_z \frac{\mathrm{dist}(z,v)}{W(z)} . \tag{15} \]

Proof. We show that for any node $z$, $\gamma_v \ge \frac{1}{18}\,\frac{\mathrm{dist}(z,v)}{W(z)}$, using a variation of the proof of Lemma 4.1.

We partition the nodes into two sets: a set $L$ which contains the nodes $v$ such that $\mathrm{dist}(z,v) \le 6m(z)$ and a set $H$ which contains the remaining nodes. By the definition of $m(z)$ we have that $W(z) \ge m(z)\left(\lfloor n/2 \rfloor - 1\right) \ge m(z)\,n/3$ (for $n \ge 9$). We obtain that for all $v \in L$,

\[ \frac{\mathrm{dist}(v,z)}{W(z)} \le \frac{6m(z)}{m(z)\,n/3} = \frac{18}{n} . \]

Therefore,

\[ \gamma_v \ge \frac{1}{n} \ge \frac{1}{18}\,\frac{\mathrm{dist}(v,z)}{W(z)} . \]

We next consider $v \in H$. Since $u$ is well positioned, by Lemma 4.5 we have that $\mathrm{dist}(z,u) \le 3m(z)$. From the triangle inequality, $\mathrm{dist}(u,v) \ge \mathrm{dist}(z,v) - \mathrm{dist}(z,u) \ge \mathrm{dist}(z,v) - 3m(z) \ge \mathrm{dist}(z,v)/2$. We also have $W(u) \le W(z) + n\,\mathrm{dist}(z,u) \le W(z) + 3n\,m(z) \le 9W(z)$. Therefore

\[ \gamma_v \ge \frac{\mathrm{dist}(u,v)}{W(u)} \ge \frac{\mathrm{dist}(z,v)/2}{9W(z)} = \frac{1}{18}\,\frac{\mathrm{dist}(z,v)}{W(z)} . \]

□
As a corollary, applying Lemma 4.2, we obtain:

Corollary 4.7. If $S_0$ contains a well-positioned node, then for any node $z$, $\mathrm{Var}[\hat{W}(z)] \le 18\,W(z)^2/k$.


4.4 Upper bound on the sum of the coefficients 

One consequence of Lemma 4.6 is that the coefficients $\gamma_v$ cannot grow too much even if the base
set $S_0$ includes all nodes.


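Corollary 4.8. Let

$$\bar{\gamma}_v \equiv \max_z \frac{\mathrm{dist}(z,v)}{W(z)} .$$

Then $\sum_v \bar{\gamma}_v \le 36$.

Proof. Consider the case where $S_0$ consists of a single well positioned node. By the definition of
$\gamma$ we have that $\sum_v \gamma_v \le 2$. By Lemma 4.6 we have $\bar{\gamma}_v \le 18\gamma_v$. Therefore
$\sum_v \bar{\gamma}_v \le 18 \sum_v \gamma_v \le 36$. □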


4.5 High probability estimates 

Lastly, we establish concentration of the estimates, which will conclude the proof of the very high
probability claims in Theorems 1.1 and 1.2.
We need the following lemma:

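Lemma 4.9. If our sampling coefficients are approximate PPS for a node $z$, that is, there is a
constant $c$ such that for all nodes $v$, $\gamma_v \ge c\,\frac{\mathrm{dist}(z,v)}{W(z)}$, and we use $k = O(\epsilon^{-2}\log n)$, then

$$\Pr\left[\frac{|\hat{W}(z) - W(z)|}{W(z)} \ge \epsilon\right] = O(1/\mathrm{poly}(n)) .$$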


Proof. We apply the Chernoff-Hoeffding bound. Let $\tau = W(z)/(ck)$. We have

$$p_v \ge \min\{1, \mathrm{dist}(z,v)/\tau\} = \min\{1, ck\,\mathrm{dist}(z,v)/W(z)\} . \tag{16}$$

The contribution of a node $v$ to the estimate $\hat{W}(z)$ is as follows. If $\mathrm{dist}(z,v) \ge \tau$, then the
contribution is exactly $\mathrm{dist}(z,v)$. Otherwise, the contribution of node $v$ is $\mathrm{dist}(z,v)/p_v \le \tau$ with
probability $p_v$ and 0 otherwise.

The contributions $X_v$ of the nodes with $\mathrm{dist}(z,v) < \tau$ are thus independent random variables, each
in the range $[0,\tau]$ with expectation $\mathrm{dist}(z,v)$. We complete the proof by applying the
Chernoff-Hoeffding bound to bound the deviation of the sum of these random variables from its
expectation. We defer the details to the full version of the paper. □
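The estimator analyzed in Lemma 4.9 can be sketched as follows. The sketch assumes independent
(Poisson) inclusion of each node with probability $p_v = \min\{1, k\gamma_v\}$, which matches the independence
used in the proof; the sampling scheme and the function names are illustrative assumptions, not taken
from the paper's algorithms.

```python
import random

def poisson_sample(gamma, k, rng):
    """Keep node v independently with probability p_v = min(1, k * gamma_v)."""
    sample = {}
    for v, g in enumerate(gamma):
        p = min(1.0, k * g)
        if rng.random() < p:
            sample[v] = p          # remember p_v for the inverse-probability weights
    return sample

def estimate_W(dist_from_z, sample):
    """Inverse-probability estimate of W(z): sum of dist(z, v) / p_v over sampled v."""
    return sum(dist_from_z[v] / p for v, p in sample.items())

rng = random.Random(7)
gamma = [0.25, 0.25, 0.25, 0.25]
full = poisson_sample(gamma, 4, rng)            # every p_v equals 1, so all nodes are kept
exact = estimate_W([0.0, 1.0, 2.0, 3.0], full)  # with p_v = 1 the estimate is the exact sum
```

A node with $\mathrm{dist}(z,v) \ge \tau$ has $p_v = 1$ and contributes its exact distance, which is the first case
in the proof above.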

We need the condition of Lemma 4.9 to hold for all nodes $z$ with probability $1 - O(1/\mathrm{poly}(n))$.
Equivalently, we would like $\gamma$ to be universal PPS with very high probability. If so, we apply a
union bound to obtain that the estimates $\hat{W}(z)$ for all nodes $z$ have a relative error of at most $\epsilon$
with probability $1 - O(1/\mathrm{poly}(n))$. The same argument applies to polynomially many queries in metric
spaces.

It follows from Lemma 4.6 that we obtain the universal PPS property if $S_0$ includes a well positioned
node. We would like this to happen with very high probability. We mention several ways to achieve this
effect: (i) Since most nodes are well positioned (Lemma 4.4), taking a uniform random sample $U$ of
$O(\log n)$ nodes, and choosing the node $u = \arg\min_{u \in U} m(u)$ with minimum distance to its
$\lceil n/2 + 1 \rceil$-th closest node, means that we are guaranteed with probability $1 - 1/\mathrm{poly}(n)$ that $u$ is
well positioned. This identification step involves $O(\log n)$ single-source distance computations.
(ii) Alternatively, we can ensure that $S_0$ contains a well positioned node (with a polynomially small
error) by simply placing $O(\log n)$ uniformly selected nodes in $S_0$. The computation of the coefficients
will then require $O(\log n)$ single-source distance computations. (iii) Lastly, if $S_0$ contains
$O(\log n)$ uniformly selected nodes then we can apply a direct argument that, with a polynomially small
error, for each node $z$ one of the $\lceil n/2 + 1 \rceil$ closest nodes to $z$ is in $S_0$. This means we can apply
Lemma 4.1 with $q \le 0.5$ to obtain that with a polynomially small error, the sampling probabilities are
approximate PPS for all nodes and thus universal PPS with a polynomially small error.
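Option (i) can be sketched in a few lines. This is an illustration under stated assumptions:
`single_source` is a hypothetical callback returning the distances from one node to all nodes (for
example, one run of Dijkstra's algorithm), a node counts as its own closest node, and the number of
candidates is left to the caller.

```python
import math
import random

def median_distance(dist_row):
    """m(u): distance from u to its ceil(1 + n/2)-th closest node."""
    n = len(dist_row)
    return sorted(dist_row)[math.ceil(1 + n / 2) - 1]

def pick_base_node(n, single_source, num_candidates, rng):
    """Sample candidates uniformly and return the one with the smallest m(u)."""
    candidates = rng.sample(range(n), num_candidates)
    # One single-source computation per candidate.
    return min(candidates, key=lambda u: median_distance(single_source(u)))

# Toy usage: 11 nodes on a path with unit edge lengths.
n = 11
path_distances = lambda u: [abs(u - v) for v in range(n)]
u = pick_base_node(n, path_distances, num_candidates=n, rng=random.Random(0))
```

With all nodes as candidates the returned node attains the minimum median distance; with
$O(\log n)$ candidates it is well positioned only with high probability.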

To establish the second part of Theorem 1.2 in metric spaces, we would like to identify a well
positioned node with a polynomially small ($O(1/\mathrm{poly}(n))$) error using only $O(n)$ distance
computations, which is more efficient than using $O(\log n)$ single-source distance computations.

To do so, we first provide a slightly relaxed definition of a well positioned node and show that it
retains the important properties. We will then show that a "relaxed" well positioned node can be
identified with very high probability using only $O(\log^2 n)$ distance computations. When we identify
such a node, we can use it in the base set $S_0$. This means we can use $O(n)$ distance computations in
total to compute coefficients $\gamma$ which are universal PPS with a polynomially small error. We then use
$O(n)$ time to compute a sample of size $k = O(\epsilon^{-2}\log n)$, and use this sample to process point queries.

What remains is to introduce the relaxed definition of a well-positioned node and show that it has
the claimed properties.

4.6 Relaxed well positioned points 

For $Q \ge \lceil 1 + n/2 \rceil$, we define the $Q$-quantile distance $m_Q(v)$ for a point $v$ as the distance of the
$Q$th closest point to $v$. We then define $\mathrm{MinMed}_Q$ as the minimum $Q$-quantile distance over all
points. Now, we define a point $u$ to be $Q$ well positioned if $m_{\lceil 1+n/2 \rceil}(u) \le 2\,\mathrm{MinMed}_Q$.

Now observe that at least half the points have $m_Q(v) \le 2\,\mathrm{MinMed}_Q$ and in particular are $Q$ well
positioned (extension of Lemma 4.4). Also observe that if $z$ is $Q$ well positioned then for any node
$u$, $\mathrm{dist}(z,u) \le 3m_Q(u)$ (extension of Lemma 4.5). We can also verify that for any $Q \le 0.6n$ (any
constant strictly smaller than 1 would do), a base set $S_0$ containing one $Q$ well positioned point
would also yield coefficients that satisfy the universal PPS property, albeit with a slightly larger
constant.

We next show that we can identify a 0.6n well positioned point within a polynomially small error using 
very few distance computations: 

Lemma 4.10. We can identify a $0.6n$ well positioned point with probability $1 - O(1/\mathrm{poly}(n))$ using
$O(\log^2 n)$ distance computations.

Proof. We choose uniformly at random a set of points $C$ of size $O(\log n)$. For each point $v \in C$, we
choose a uniform sample $S_v$ of $O(\log n)$ points and compute the 0.55 quantile of
$\{\mathrm{dist}(v,u) \mid u \in S_v\}$. We then return the point $v \in C$ with the minimum sample 0.55 quantile.

We refer to $C$ as the set of candidates. Note that since at least half the points $v \in V$ are such that
$m_{0.6n}(v) \le 2\,\mathrm{MinMed}_{0.6n}$, the set $C$ contains such a point with probability $1 - O(1/\mathrm{poly}(n))$.

The estimates are such that with probability $1 - O(1/\mathrm{poly}(n))$, for all points in $C$, the sample 0.55
quantile is between the actual 0.5 and 0.6 quantiles. Therefore the point we returned (with a
polynomially small error) has $m_{0.5n}$ at most the smallest $m_{0.6n}$ in $C$, which is at most
$2\,\mathrm{MinMed}_{0.6n}$. □
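The procedure in the proof of Lemma 4.10 translates almost directly into code. The sketch below is
illustrative: the per-candidate samples are drawn with replacement (the text does not fix this), the
quantile convention is the $\lceil qs \rceil$-th smallest of $s$ values, and the function names and the sizes of the
candidate set and samples are placeholders chosen by the caller.

```python
import math
import random

def sample_quantile(values, q):
    """q-quantile of a list: the ceil(q * len)-th smallest value."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]

def find_relaxed_well_positioned(points, dist, num_candidates, sample_size, rng):
    """Return the index of the candidate with the smallest sampled 0.55 quantile."""
    candidates = rng.sample(range(len(points)), num_candidates)
    best, best_quantile = None, math.inf
    for c in candidates:
        sample = rng.choices(points, k=sample_size)   # uniform, with replacement
        quantile = sample_quantile([dist(points[c], s) for s in sample], 0.55)
        if quantile < best_quantile:
            best, best_quantile = c, quantile
    return best
```

The number of distance computations is `num_candidates * sample_size`, which is $O(\log^2 n)$ when
both are $O(\log n)$.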


5 All-pairs sum 


We now establish the claims of Theorem 1.3 for the all-pairs sum problem. We start with the first part
of the claim, which is useful for graphs and estimates $\mathrm{APS}(V)$ using single-source computations. To do
so, we apply Algorithm 1 to compute sampling coefficients $\gamma$ and then apply Algorithm 2 to compute
estimates $\hat{W}(v)$ for all $v$. Finally, we return the estimate $\widehat{\mathrm{APS}}(V) = \sum_z \hat{W}(z)$.

To obtain an estimate $\widehat{\mathrm{APS}}(V)$ with CV of at most $\epsilon$, we choose a base set $S_0$ that contains 2
uniformly sampled nodes when applying Algorithm 1. We then use sample size of $O(\epsilon^{-2})$ to ensure that
the per-node estimates $\hat{W}(z)$ have CV of at most $\epsilon$. Note that the estimates of different nodes are
correlated, as they all use the same sample, but the CV of the sum of estimates each with CV of at
most $\epsilon$ must be at most $\epsilon$. The total time amounts to $O(\epsilon^{-2})$ single-source distance computations.
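The claim that the sum inherits the per-node CV bound despite the correlations follows from the
Cauchy-Schwarz inequality applied to the covariances. A short derivation, where $\sigma_z$ (a symbol
introduced here for illustration) denotes the standard deviation of $\hat{W}(z)$, so that $\sigma_z \le \epsilon W(z)$:

```latex
\begin{align*}
\sqrt{\mathrm{Var}\Big[\sum_z \hat{W}(z)\Big]}
  &= \sqrt{\sum_{z,z'} \mathrm{Cov}\big[\hat{W}(z),\hat{W}(z')\big]}
   \le \sqrt{\sum_{z,z'} \sigma_z \sigma_{z'}}
   = \sum_z \sigma_z \\
  &\le \epsilon \sum_z W(z) = \epsilon\,\mathrm{APS}(V) .
\end{align*}
```

Since the estimate of the sum is unbiased, dividing by its expectation $\mathrm{APS}(V)$ gives a CV of at
most $\epsilon$.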

To obtain universal PPS with polynomially small error we can identify a well positioned node with a
polynomially small error, which can be done using $O(\log n)$ single-source computations. We then
compute the sampling coefficients $\gamma$ for a base set that contains this well-positioned node (which uses
a single-source distance computation). The sampling coefficients we obtain have the universal PPS
property and the sample-based estimates are concentrated. A sample of size $O(\epsilon^{-2}\log n)$ would yield a
relative error of at most $\epsilon$ with probability $1 - 1/\mathrm{poly}(n)$, for each $\hat{W}(z)$ and thus for the sum
$\widehat{\mathrm{APS}}(V)$. In total, we used $O(\epsilon^{-2}\log n)$ single-source computations.

The remaining part of this section treats the second part of the claim of Theorem 1.3, which applies
to the all-pairs sum problem in metric spaces. We start with an overview of our approach. In order to
obtain a good sample of pairs, we would like to sample pairs proportionally to
$p_{ij} = \mathrm{dist}(i,j)/\mathrm{APS}(V)$. The obvious difficulty we have to overcome is that the explicit computation
of the probabilities $p_{ij}$ requires a quadratic number of distance calculations.

Our first key observation is that we can obtain a sample with (nearly) the same statistical guarantees
if we relax a little the sampling probabilities and the sample size: For some constant $c \ge 1$, we work
with probabilities that satisfy $p_{ij} \ge \frac{\mathrm{dist}(i,j)}{c\,\mathrm{APS}(V)}$ and use a sample of size $k = c\epsilon^{-2}$.

We use independent sampling with replacement to compute a multiset $S$ of pairs of points from
$V \times V$. The estimator we use is the sample average inverse probability estimator:

$$\widehat{\mathrm{APS}}(V) = \frac{1}{k} \sum_{(i,j) \in S} \mathrm{dist}(i,j)/p_{ij} .$$


This sample average is an unbiased estimate of $\mathrm{APS}(V)$ and has CV of at most $\sqrt{c/k}$, which is $\epsilon$
when we use sample size $k = c\epsilon^{-2}$. Moreover, each summand is by definition at most $c\,\mathrm{APS}(V)$ and
therefore we obtain concentration by a direct application of Hoeffding's inequality: the probability
of a relative error that is larger than $\epsilon$ when the sample size is $k$ is at most $2\exp(-2k\epsilon^2/c^2)$. In
particular, if we take a sample size that is $O(\epsilon^{-2}\log n)$, we obtain that the probability that the
relative error exceeds $\epsilon$ is polynomially small in $n$.
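The sample average inverse probability estimator can be sketched as below. For illustration only,
the code enumerates all ordered pairs and their probabilities explicitly, which is the quadratic work
the rest of this section avoids; the function name and the toy data are assumptions.

```python
import random

def estimate_aps(pairs, probs, dist, k, rng):
    """Draw k pairs with replacement according to probs and average dist / p."""
    drawn = rng.choices(range(len(pairs)), weights=probs, k=k)
    return sum(dist(*pairs[t]) / probs[t] for t in drawn) / k

points = [0.0, 1.0, 3.0]
dist = lambda a, b: abs(a - b)
pairs = [(a, b) for a in points for b in points if a != b]
total = sum(dist(a, b) for a, b in pairs)             # exact sum over ordered pairs
exact_pps = [dist(a, b) / total for a, b in pairs]    # ideal PPS probabilities (c = 1)
estimate = estimate_aps(pairs, exact_pps, dist, k=20, rng=random.Random(1))
```

With exact PPS probabilities every summand equals the exact sum, so the estimate has zero variance;
relaxed probabilities with constant $c$ keep each summand below $c\,\mathrm{APS}(V)$.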


We next discuss how we facilitate such sampling efficiently. We would like to be able to sample with
respect to relaxed $p_{ij}$ and also have the sampling probabilities available for estimation. We show
that we can express a set of relaxed probabilities (for some constant $c$) as the outer product of two
probability distributions over points, $\gamma\rho^T$. The distribution $\gamma$ has the universal PPS property with
respect to some constant $c'$. The probability distribution $\rho$ has the property that for some constant
$c''$, for all $v$, $\rho_v \ge \frac{W(v)}{c''\,\mathrm{APS}(V)}$. We observe that for some constant $c = c'c''$, for all pairs $u, v$,
$\rho_u\gamma_v \ge \frac{\mathrm{dist}(u,v)}{c\,\mathrm{APS}(V)}$. Therefore, we can sample according to $p_{uv} = \rho_u\gamma_v$ and satisfy the relaxed
conditions and obtain the desired statistical guarantees.

What remains is to provide details on (i) how we use the vectors $\gamma$ and $\rho$ to obtain a sample of pairs
and (ii) how we compute such vectors that satisfy our conditions within a polynomially small error.
These are addressed in the next two subsections.

5.1 Sampling pairs using the coefficient vectors 

We show how we obtain $k$ samples $(v,u)$ from $\gamma_v\rho_u$ efficiently, using computation that is $O(n+k)$.
Many sampling schemes (with or without replacement) will have the concentration properties we seek and
the implementations are fairly standard. For completeness, we describe a scheme that computes
independent samples with replacement. Our scheme obtains a sample from $V \times V$ by sampling
independently a point $i$ according to the probability distribution $\gamma$ and a point $j$ according to
distribution $\rho$ and returning $(i,j)$.

What remains is to describe how we can obtain $k$ independent samples with replacement from a
probability vector $\gamma$ in time $O(n+k)$.

We arbitrarily order the points; WLOG $i$ is the $i$th point in the order. We compute
$a_i = \sum_{h<i} \gamma_h$ and associate the interval $[a_i, a_i + \gamma_i)$ with the point $i$.

To randomly draw a point $i$ according to $\gamma$, we can draw a random number $x \sim U[0,1]$ and take the
point $i \in V$ such that $x \in [a_i, a_i + \gamma_i)$. If we have $k$ sorted random values, we can map all of them
to points in $V$ in $O(n+k)$ time using one pass on the sorted values and the sorted nodes. For
completeness, we describe one way to obtain a sorted set of $k$ independent random draws
$x_1, \ldots, x_k \sim U[0,1]$ using $O(k)$ operations: (i) We draw $k$ values $y_1, \ldots, y_k$ where
$y_i \sim \mathrm{Exp}[k+1-i]$ is exponentially distributed with parameter $k+1-i$. This can be done by drawing
independent uniform $u_i \sim U[0,1]$ and taking $y_i = -\ln(u_i)/(k+1-i)$. (ii) Now observe that
$x'_i = \sum_{j \le i} y_j$ are distributed as $k$ independent exponential random variables with parameter 1
which are sorted in increasing order. We can then transform $x'_i$ to uniform random variables $x_i$
using $x_i = 1 - \exp(-x'_i)$. Since the transformation is monotone, we obtain that the $x_i$ are sorted.
Note that prefix sums of $y_j$ and hence all $x_i$ can be computed in $O(k)$ operations. Also note that we
only need precision to the point needed to identify the point that each $x_i$ maps into.
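The scheme just described can be sketched as follows: sorted uniform draws are produced from
exponential spacings in $O(k)$ operations and then merged with the prefix intervals of $\gamma$ in a
single pass. The function names are hypothetical, and the sketch assumes $\gamma$ is a list of
nonnegative values summing to 1.

```python
import math
import random

def sorted_uniforms(k, rng):
    """k independent U[0,1] draws, returned in increasing order, using O(k) operations."""
    xs, prefix = [], 0.0
    for i in range(1, k + 1):
        u = 1.0 - rng.random()                 # in (0, 1], which avoids log(0)
        prefix += -math.log(u) / (k + 1 - i)   # y_i ~ Exp[k + 1 - i]
        xs.append(1.0 - math.exp(-prefix))     # monotone map from sorted Exp(1) to U[0,1]
    return xs

def sample_with_replacement(gamma, k, rng):
    """Map k sorted uniforms to point indices in one pass over values and points."""
    out, i, upper = [], 0, gamma[0]            # upper = a_i + gamma_i for the current i
    for x in sorted_uniforms(k, rng):
        while x >= upper and i < len(gamma) - 1:
            i += 1
            upper += gamma[i]
        out.append(i)
    return out
```

The returned indices come out in nondecreasing order. To sample pairs, one would call
`sample_with_replacement` once for $\gamma$ and once for $\rho$ and shuffle one of the two results
before zipping them, so that the two coordinates are paired independently.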

5.2 Computing the coefficient vectors 

We recall that universal PPS coefficients can be computed using Algorithm 1, using $n$ distance
computations (and $O(n)$ additional computation), when our base set $S_0$ contains a well positioned
point. The probability vector $\gamma$ we work with is the universal PPS coefficients scaled to have a sum
of 1.

We next discuss how we obtain the probability distribution $\rho$. We show that given a $0.6n$ well
positioned point (see Section 4.6), we can compute $\rho$ that has the claimed properties with very high
probability. From Lemma 4.10 we can identify a point that is $0.6n$ well positioned with probability at
least $1 - 1/\mathrm{poly}(n)$, using only $O(\log^2 n)$ distance computations. We use the following lemma, which
is a variation of a claim used for the pivoting upper bound estimate in [6]. What it roughly says is
that for any node $u$ and any node $z$ that is within a constant times some quantile distance from $u$, we
can get a constant factor approximation of $W(u)$ from $W(z)$ and $\mathrm{dist}(u,z)$.

Lemma 5.1. Consider a point $u$ and a point $z$ such that $\mathrm{dist}(u,z)$ is at most $c$ times the distance of
the $(qn)$th closest point to $u$. Then

$$W(u) \le n\,\mathrm{dist}(u,z) + W(z) \le \left(1 + \frac{2c}{1-q}\right) W(u) .$$

Proof. The left hand side is immediate from the triangle inequality. To establish the right hand side,
first note that $(1-q)n$ of the points are at least as far as $\mathrm{dist}(z,u)/c$, thus
$W(u) \ge \frac{(1-q)n}{c}\,\mathrm{dist}(u,z)$. From the triangle inequality we have $W(z) \le W(u) + n\,\mathrm{dist}(u,z)$.
Combining we get:

$$W(z) + n\,\mathrm{dist}(u,z) \le W(u) + 2n\,\mathrm{dist}(u,z) \le \left(1 + \frac{2c}{1-q}\right) W(u) .$$

□


Now consider a point $z$ that is $0.6n$ well positioned, and use the rough estimates

$$W'(u) = n\,\mathrm{dist}(u,z) + W(z)$$

for all points $u$ and accordingly the sampling probabilities

$$\rho_i = \frac{W'(i)}{\sum_j W'(j)} .$$

By definition, for all points $u$, the point $z$ satisfies $\mathrm{dist}(u,z) \le 3m_{0.6n}(u)$. We therefore can
apply the lemma with $q = 0.6$ and $c = 3$ and obtain that for all $v$,
$\rho_v \ge \frac{1}{1 + \frac{2c}{1-q}}\,\frac{W(v)}{\sum_j W(j)}$. Note that given $z$, the vector $\rho$ can be computed for all points using
$n$ distance computations, from $z$ to all other points.
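Computing $\rho$ from a pivot $z$ takes one pass over the distances from $z$. The sketch below uses an
arbitrary pivot on random planar points (assumptions made for illustration), so it demonstrates only
the left inequality of Lemma 5.1, which holds for every pivot; the constant-factor upper bound needs
$z$ to be $0.6n$ well positioned. The function name is hypothetical.

```python
import math
import random

def rho_from_pivot(points, dist, z):
    """Return (rho, rough) where rough[u] = W'(u) = n * dist(u, z) + W(z)."""
    n = len(points)
    from_z = [dist(points[z], p) for p in points]   # the only n distance computations
    W_z = sum(from_z)
    rough = [n * d + W_z for d in from_z]
    total = sum(rough)
    return [w / total for w in rough], rough

random.seed(1)
points = [(random.random(), random.random()) for _ in range(40)]
rho, rough = rho_from_pivot(points, math.dist, z=0)

# Exact sums, computed with quadratic work purely to check W(u) <= W'(u).
exact_W = [sum(math.dist(p, q) for q in points) for p in points]
```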

6 Uniform sampling based estimates 

For completeness, we briefly present another solution for the all-points/nodes problem that is based on
uniform sampling. The disadvantages compared with our weighted sampling approach are that it provides
biased estimates and requires $\epsilon^{-2}\log n$ samples even when we are interested only in per-query
guarantees.

To do so, we use a key lemma proved by Indyk [?]. A proof of this lemma also appears in [22]. The
lemma was used to establish the correctness of his approximate 1-median algorithm.

Lemma 6.1. Let $Q \subseteq V$, $|Q| = k$, be sampled uniformly at random (from all subsets of size $k$). Let $u$
and $v$ be two vertices such that $W(v) \ge (1+\epsilon)W(u)$. Then $\Pr(W_Q(u) > W_Q(v)) \le \exp(-\Omega(\epsilon^2 k))$.

Lemma 6.1 shows that if the average distances of two nodes differ by a factor larger than $1+\epsilon$, and
we use a sample of size $\Omega(\epsilon^{-2})$, then the probability that the vertex of smaller average distance has
larger average distance to the sample decays exponentially with the sample size. This lemma
immediately implies that the 1-median with respect to a sample of size $O(\log n/\epsilon^2)$ is a
$(1+\epsilon)$-approximate 1-median with high probability.
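The 1-median consequence of Lemma 6.1 can be sketched as follows. The sketch assumes symmetric
distances and a hypothetical `single_source` callback that returns the distances from one node to all
nodes; the sample size is left to the caller (the lemma suggests $O(\log n/\epsilon^2)$).

```python
import random

def approx_one_median(n, single_source, sample_size, rng):
    """Return the node with the smallest total distance to a uniform sample of nodes."""
    totals = [0.0] * n
    for s in rng.sample(range(n), sample_size):       # uniform subset, without replacement
        for v, d in enumerate(single_source(s)):
            totals[v] += d                            # dist(s, v) = dist(v, s) by symmetry
    return min(range(n), key=totals.__getitem__)

# Toy usage: 9 nodes on a path with unit edge lengths, sampling every node.
n = 9
path_distances = lambda s: [abs(s - v) for v in range(n)]
center = approx_one_median(n, path_distances, sample_size=n, rng=random.Random(0))
```

When the sample is the whole node set, the result is the exact 1-median; with a smaller sample the
guarantee is only the probabilistic one stated above.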

To approximate $W(v)$ for all nodes $v$, we use a uniform sample of size $O(\epsilon^{-2}\log n)$ and order the
nodes according to the average distance to the sample. Using the lemma, and comparing to the ideal
sorted order by exact $W(v)$, two nodes $v, u$ that are transposed have with high probability $W(v)$ and
$W(u)$ within $1 \pm \epsilon$ from each other.

Recall however that the average distance to the uniform sample can be a very bad approximation of
the average distance to the data set. We therefore perform adaptively another set of $O(\epsilon^{-2}\log n)$
single-source distance computations to compute exact $W(v)$ of enough nodes in this nearly sorted order,
so that the difference between exact $W(v)$ of consecutive processed nodes is within $(1 \pm \epsilon)$.

We also mention here, for completeness, an improved approximate 1-median algorithm provided by
Indyk. This algorithm only applies in metric spaces and computes a $(1+\epsilon)$-approximate 1-median with
constant probability using only $O(n\epsilon^{-2})$ distance computations (eliminating the logarithmic factor). The
algorithm works in iterations, where in each iteration a fraction of the points, those with largest average 
distance to the current sample, are excluded from further considerations. The sample size is then increased 
by a constant factor, obtaining more accurate estimates for the remaining points. The final sample size used 
is linear, but the set of remaining nodes is very small. This algorithm only applies in metric spaces because, 
as we mentioned in the introduction, arbitrary distance computations are not efficient in graphs. Indyk’s 
approach can be extended to compute any approximate quantile of the distribution with similar probabilistic 
guarantees. 

7 Hardness of Computing Sum of All-Pairs Distances 

In this section we show that if there is a truly subcubic algorithm for computing $\mathrm{APS}(V)$, the exact
sum of all pairs distances, then there is a truly subcubic algorithm for computing All Pairs Shortest
Paths (APSP).

Williams and Williams [?] showed that APSP is subcubic equivalent to negative triangle detection. In
the negative triangle detection problem we are given an undirected weighted graph $G = (V,E)$ with
integer weights in $\{-M, \ldots, M\}$ and the goal is to determine if the graph contains a negative
triangle, that is, a triangle whose edge weights sum up to a negative number. Therefore to show that a
subcubic algorithm for $\mathrm{APS}(V)$ implies a subcubic algorithm for APSP it suffices to give a subcubic
reduction from the negative triangle detection problem to computing $\mathrm{APS}(V)$. We show this by the
following lemma.

Lemma 7.1. Given an $O(T(n,m))$ time algorithm for computing the sum of all distances ($\mathrm{APS}(V)$), there
is an $O(T(n,m) + n^2)$ time algorithm for detecting a negative triangle.

Proof. For an input instance $G = (V,E)$ for the negative triangle detection problem we construct a
graph $G' = (V',E')$ for the sum of all distances problem. The vertex set $V'$ is the union of three
copies of $V$, that is $V' = V_1 \cup V_2 \cup V_3$ where vertex $u_i \in V_i$, $i = 1, 2, 3$, corresponds to vertex
$u \in V$. We set $E' = \{(u,v) \mid u, v \in V'\}$, that is, $G'$ is a complete graph.

Let $\omega(e)$ denote the length of an edge $e \in E$. Recall that $\omega(e) \in \{-M, \ldots, M\}$. Let $N = 4M$.
We define the length $\omega'(e)$ of an edge $e \in E'$ as follows. For every $(u,v) \in E$ we define
$\omega'(u_1,v_2) = N + \omega(u,v)$, $\omega'(u_2,v_3) = N + \omega(u,v)$, and $\omega'(u_3,v_1) = 2N - \omega(u,v)$. We set
$\omega'(e) = 3N/2$ for any other edge $e \in E'$.

We claim that $\mathrm{APS}(V') = \sum_{(u,v) \in E'} \omega'(u,v)$ if and only if $G$ does not contain a negative triangle.
In other words, we claim that either every edge in $G'$ is a shortest path or $G$ contains a negative
triangle.

To see the first direction, assume $G$ contains a negative triangle $(u,v), (u,x), (x,v)$. Now consider
the path $P = (u_3,x_2), (x_2,v_1)$ from $u_3$ to $v_1$. Note that the length of this path is
$\omega'(u_3,x_2) + \omega'(x_2,v_1) = N + \omega(u,x) + N + \omega(x,v) < 2N - \omega(u,v) = \omega'(u_3,v_1)$, where the strict
inequality follows since $(u,v), (u,x), (x,v)$ is a negative triangle. It follows that
$\mathrm{APS}(V') < \sum_{(u,v) \in E'} \omega'(u,v)$.

To see the second direction, assume that $\mathrm{APS}(V') < \sum_{(u,v) \in E'} \omega'(u,v)$. We need to show that $G$
has a negative triangle.

We first claim that for every edge $(u,v)$ which does not correspond to an edge in $G$ (and hence
$\omega'(u,v) = 3N/2$) we have $\omega'(u,v) = \mathrm{dist}_{G'}(u,v)$ (regardless of whether $G$ has a negative triangle
or not). To see this, note that $\omega'(u,v) = 3N/2 = 6M$ and that every path from $u$ to $v$ that consists
of more than one edge is of weight at least $2N - 2M = 6M$. The same argument also holds for every
edge from $V_1$ to $V_2$ and for every edge from $V_2$ to $V_3$.

It follows that only edges $(x,y) \in E'$ such that $x \in V_3$ and $y \in V_1$ may not be shortest paths. If
$\mathrm{APS}(V') < \sum_{(u,v) \in E'} \omega'(u,v)$ then there must be an edge $(u_3,v_1) \in E'$ such that $u_3 \in V_3$ and
$v_1 \in V_1$ and the edge $(u_3,v_1)$ is not a shortest path. It is not hard to verify that only paths of
the form $(u_3,x_2), (x_2,v_1)$ such that both edges $(u_3,x_2)$ and $(x_2,v_1)$ correspond to edges of $G$
could be shorter than the path $(u_3,v_1)$. Let $(u_3,x_2), (x_2,v_1)$ be the shortest path from $u_3$ to
$v_1$. We get that $N + \omega(u,x) + N + \omega(x,v) = \omega'(u_3,x_2) + \omega'(x_2,v_1) < \omega'(u_3,v_1) = 2N - \omega(u,v)$. So
$\omega(u,x) + \omega(x,v) + \omega(u,v) < 0$ and $G$ has a negative triangle. □
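The reduction in the proof of Lemma 7.1 can be exercised on small inputs. In the sketch below the
all-pairs-sum oracle is passed in as a function; a plain Floyd-Warshall computation stands in for the
hypothetical $T(n,m)$ time algorithm, so the sketch illustrates the construction of $G'$ only, not a
subcubic method. Vertices of $G$ are assumed to be numbered $0, \ldots, n-1$, copy $V_i$ occupies indices
$(i-1)n, \ldots, in-1$, $M \ge 1$, and sums run over unordered pairs. All names are illustrative.

```python
def has_negative_triangle(n, weights, M, all_pairs_sum):
    """weights: dict {(u, v): w} of undirected edges of G with integer |w| <= M."""
    N = 4 * M
    size = 3 * n
    w = [[3 * N // 2] * size for _ in range(size)]   # default length 3N/2 = 6M
    for i in range(size):
        w[i][i] = 0
    for (u, v), wt in weights.items():
        for a, b in ((u, v), (v, u)):                # G is undirected
            for i, j, length in ((a, n + b, N + wt),            # (a_1, b_2)
                                 (n + a, 2 * n + b, N + wt),    # (a_2, b_3)
                                 (2 * n + a, b, 2 * N - wt)):   # (a_3, b_1)
                w[i][j] = w[j][i] = length
    edge_sum = sum(w[i][j] for i in range(size) for j in range(i + 1, size))
    # Some edge of G' fails to be a shortest path exactly when G has a negative triangle.
    return all_pairs_sum(w) < edge_sum

def floyd_warshall_sum(w):
    """Stand-in oracle: sum of shortest-path distances over unordered pairs."""
    size = len(w)
    d = [row[:] for row in w]
    for m in range(size):
        for i in range(size):
            for j in range(size):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return sum(d[i][j] for i in range(size) for j in range(i + 1, size))
```

Apart from the oracle call, the work is the $O(n^2)$ construction of the length matrix and the
summation of its entries, matching the additive term in the lemma.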

8 Extensions and Comments 

8.1 The distribution of centrality values 

What can we say about the centrality distribution? First we observe that the range of average distance
$W(v)/n$ values is between $D/n$ and $D$, where $D$ is the diameter (maximum distance between a pair of
points in $V$). To see the upper bound, note that the average of values that are at most $D$ is at most
$D$. For the lower bound, let $u$ and $v$ be nodes such that $\mathrm{dist}(u,v) = D$. Then for all $h \in V$, from
the triangle inequality, $\mathrm{dist}(u,h) + \mathrm{dist}(h,v) \ge D$, thus $W(h) \ge D$.

Lemma 8.1. The highest average distance value must satisfy

$$\max_{v \in V} W(v)/n \ge D/2 .$$

Proof. Consider the two nodes $u$ and $v$ such that $\mathrm{dist}(u,v) = D$. From the triangle inequality, any
point $h \in V$ has $\mathrm{dist}(u,h) + \mathrm{dist}(h,v) \ge D$. Summing over $h$ we obtain that $W(u) + W(v) \ge nD$.
Therefore, either $W(u)$ or $W(v)$ is at least $nD/2$. □

Lemma 8.2. If $z = \arg\min_{v \in V} W(v)$ is the 1-median, then at least half the nodes satisfy
$W(v) \le 3W(z)$.

Proof. Take the median distance $m(z)$ from $z$. Then the average distance from $z$ is at least $m(z)/2$.
Thus, $n\,m(z) \le 2W(z)$. Consider now a node $v$ that is one of the $n/2$ closest to $z$. For any node $u$,
$\mathrm{dist}(v,u) \le \mathrm{dist}(z,u) + m(z)$. Therefore,

$$W(v) = \sum_u \mathrm{dist}(v,u) \le \sum_u \mathrm{dist}(z,u) + n\,m(z) \le n\,m(z) + W(z) \le 3W(z) .$$

□

Last we observe that it is easy to realize networks where there is a large spread of centrality values.
At the extreme, consider a single point (node) that has distance $D$ to a very tight cluster of $n-1$
points. The points in the cluster have $W(v) \approx D$ whereas the isolated point has $W(v) \approx nD$. More
generally, networks (or data sets) containing well separated clusters with different sizes would
exhibit a spread in centrality values.

A side comment is that as a corollary of the proof of Lemma 8.1 we obtain that the all pairs sum in
metric spaces can be estimated with CV $\epsilon$ and good concentration by the scaled average of distances of
$O(n\epsilon^{-2})$ pairs sampled uniformly at random, as established in [2]. This is because there are at least
$n-1$ pairwise distances that are at least $D/2$, since each point that is not an endpoint of the
diameter is of distance at least $D/2$ from at least one of the endpoints. Since the maximum distance
is $D$, this immediately implies our claim. Recall, however, that when we are restricted to using
$O(\epsilon^{-2})$ single-source distance computations from a uniform sample of nodes, the estimates can have
large CV, but a similar bound can still be obtained using our weighted sampling approach (see
Corollary 1.3).

8.2 Limitation to distances 

We showed that any set of points $V$ in any metric space can be "sparsified" in the sense that a
weighted sample of size $O(\epsilon^{-2})$ allows us to estimate $W(v)$ for any point $v$ in the space. We refer to
such a sample as a universal PPS sample, since it encapsulates a PPS sample of the entries in each row
of the matrix. One can ask if we can obtain similar sparsification with respect to other nonnegative
symmetric matrices. We first observe that in general, the size of a universal PPS sample may be
$\Omega(n)$: Consider a matrix $A_{n \times n}$ so that for $i \in [n/2]$, $A_{2i-1,2i} \gg 0$ but all other entries are 0 (or
close to 0). The average of each row is dominated by the other member of the pair $(2i-1, 2i)$, and
therefore, any universal PPS sample must sample most points with probability close to 1.

Such a matrix cannot be realized with distances, as it violates the triangle inequality, but it can be
realized when entries correspond to (absolute values of) inner products of $n$ vectors in
$n$-dimensional Euclidean space $\mathbb{R}^n$. In this case, the sampling question we ask is a well studied
embedding problem [?], for which it is known that the size of a universal PPS sample (the terminology
leverage scores is used) can be of size $\Theta(d\epsilon^{-2})$, where $d$ is the dimension [?]. Intuitively, the gap
between the universal PPS size between distances and inner products stems from the observation that
being "far" (large distance) is something that usually applies with respect to many nodes, whereas
being "close" (large inner product) is a local property.

8.3 Weighted centrality 

Often different points $v$ have different importance $\beta(v)$. In this case, we would like our centrality
measure to reflect that by considering a weighted average of distances

$$\sum_j \beta(j)\,\mathrm{dist}(x_i, x_j) .$$

Our results, and in particular, the sampling construction extend to the weighted setting. First,
instead of uniform base probabilities $1/n$, we use PPS probabilities according to $\beta$, that is,
$\beta(i)/\sum_j \beta(j)$ for node $i$. Second, when considering distances and probabilities from a base node $u$,
we use weight equal to $\beta(v)\,\mathrm{dist}(u,v)$ (the product of $\beta$ and distance). Third, in the analysis, we
need to take quantiles/medians with respect to $\beta$ mass and not just the number of points.

8.4 Adaptive (data dependent) sampling 

We showed that the number of samples needed to determine an approximate 1-median on graphs is
$O(\epsilon^{-2}\log n)$, where for each sample we perform a single-source distance computation. This bound is
worst case, which materializes when the 1-median $z$ is such that all other points have
$W(u) = (1+\epsilon)W(z)$. In this case, only the exact 1-median qualifies as an approximate 1-median and
also, since there are so many other points, some are likely to have estimated $\hat{W}(u) < \hat{W}(z)$ if we use
a smaller sample. On realistic instances, however, we would expect a larger separation between the
1-median and most other points. This would allow us to use fewer samples if we adaptively determine
the sample size. Such an approach was proposed in [7] to identify a node with approximate maximum
marginal influence and similarly can be applied here for the 1-median.


9 Conclusion 

Weighted samples are often used as compact summaries of weighted data. With weighted sampling, even
of very skewed data, a PPS sample of size $\epsilon^{-2}$ would provide us with good estimates with CV of $O(\epsilon)$ on
the total sum of the population. The surprise factor of our result, which relies only on properties of metrics,
is that we can design a single set of sampling probabilities, which we termed universal PPS, that forms a 
good weighted sample from the perspectives of any point in the metric space. Moreover, we do so in an 
almost lossless way in terms of the sample size to estimation quality tradeoffs. In particular, the sample 
size does not depend on the number of points n or the dimension of the space. Another perhaps surprising 
consequence of our results is that there is a rank-1 matrix that approximates the PPS probabilities of the full 
pairwise distances matrix. The approximation can be expressed as the outer product of two vectors, which 
can be computed using a linear number of distance computations.